ON THE EFFICIENT COMPUTATION OF RECURRENCE RELATIONS


Recently much progress has been made in the formulation of parallel algorithms which compute the terms of a sequence $(y_i)$ defined by

$$y_0 \text{ given}, \qquad y_i = F_i(y_0, y_1, \ldots, y_{i-1}), \quad i = 1, \ldots, N. \tag{1}$$

The germinal point of this work is the now well-known "log-sum" algorithm, which computes $\sum_{i=1}^{N} a_i$ in $\lceil \log_2 N \rceil$ parallel addition steps, given $\lceil N/2 \rceil$ processors. Here the underlying recurrence is

$$y_0 = 0, \qquad y_i = y_{i-1} + a_i, \quad i = 1, \ldots, N;$$

$y_N$ is the desired result.
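The log-sum idea can be sketched as a pairwise (tree) reduction. This is an illustrative Python sketch, not code from the report. The function name `log_sum` is hypothetical. Each pass of the loop stands in for one parallel addition step, in which up to $\lceil N/2 \rceil$ processors would add disjoint pairs at once.

```python
def log_sum(a):
    """Sum a by pairwise (tree) reduction; return (total, parallel_steps).

    The additions inside one pass are independent of each other, so one
    pass models one parallel addition step with at most ceil(N/2) processors.
    """
    vals = list(a)
    steps = 0
    while len(vals) > 1:
        # add neighbours (0,1), (2,3), ...; all of these could run at once
        paired = [vals[k] + vals[k + 1] for k in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:            # an odd element out waits for the next pass
            paired.append(vals[-1])
        vals = paired
        steps += 1
    return (vals[0] if vals else 0), steps
```

For example, `log_sum([1, 2, 3, 4, 5])` returns `(15, 3)`, matching $\lceil \log_2 5 \rceil = 3$.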

Two apparently distinct generalizations of the log-sum algorithm have appeared. Kogge and Stone [1] have considered the case

$$y_i = f(b_i, g(a_i, y_{i-1})), \quad i = 1, \ldots, N, \tag{2}$$

where $f$ is associative, $g$ distributes over $f$, and there is a function $h$ such that $g(x, g(y, z)) = g(h(x, y), z)$. Although the method seems restricted to first-order recurrences, $m$th-order recurrences are also treated by a suitable mapping.
Heller [2] has studied the case

$$y_0 = h_0, \qquad y_i = \sum_{j=0}^{i-1} a_{ij} y_j + h_i, \quad i = 1, \ldots, N. \tag{3}$$

This problem is equivalent to the solution of a lower triangular linear 
system of equations. In this note we give an improved parallel algorithm 
for (3) and show a relationship between the two generalizations. 





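Rewrite (3) as $(I - L)y = h$, where $L$ is a strictly lower triangular matrix, $I$ is the identity, and $y$ and $h$ are $(N+1)$-vectors. Since $L^{N+1} = 0$,

$$(I - L)^{-1} = I + L + L^2 + \cdots + L^N = (I + L^{2^m})(I + L^{2^{m-1}}) \cdots (I + L^2)(I + L),$$

where $m = \lfloor \log_2 N \rfloor$. The product expands to $I + L + \cdots + L^{2^{m+1}-1}$, and every power beyond $L^N$ vanishes because $2^{m+1} > N$. Thus we have the algorithm: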

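$y \leftarrow h$; $\mathit{LI} \leftarrow I$;
for $i \leftarrow 0$ step 1 until $m-1$ do
    $\{\, y \leftarrow (I + L^{2^i})\,y;\ \ \mathit{LI} \leftarrow (I + L^{2^i})\,\mathit{LI};\ \ L^{2^{i+1}} \leftarrow L^{2^i} L^{2^i} \,\}$;
$\{\, y \leftarrow (I + L^{2^m})\,y;\ \ \mathit{LI} \leftarrow (I + L^{2^m})\,\mathit{LI} \,\}$.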

The algorithm is sequential in $i$, and the operations within the braces are performed concurrently. When it completes, $y$ holds the desired solution and $(I - L)^{-1}$ is stored in $\mathit{LI}$. $\mathit{LI}$ may then be used to compute $y' = \mathit{LI}\,h'$ for any new right-hand side $h'$.
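The algorithm can be simulated sequentially to check it. This is a minimal sketch under stated assumptions. Matrices are plain Python lists of lists with exact integer entries, and the helper names (`identity`, `matmul`, `matvec`, `solve_doubling`) are hypothetical. The updates inside the braces are evaluated one after another from the old values rather than concurrently. The loop runs $m + 1$ stages with $m = \lfloor \log_2 N \rfloor$, and it skips the squaring on the last stage.

```python
def identity(n):
    """n x n identity matrix as a list of lists."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(X, Y):
    """Plain matrix product X @ Y."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def matvec(X, v):
    """Matrix-vector product X v."""
    return [sum(x * w for x, w in zip(row, v)) for row in X]


def solve_doubling(L, h):
    """Solve (I - L) y = h for a strictly lower triangular (N+1) x (N+1) L.

    Uses (I - L)^{-1} = (I + L^(2^m)) ... (I + L^2)(I + L), m = floor(log2 N).
    Returns (y, LI) with LI = (I - L)^{-1}.
    """
    n = len(h)                      # n = N + 1
    stages = (n - 1).bit_length()   # m + 1 stages for N >= 1; 0 when N = 0
    y = list(h)
    LI = identity(n)
    P = [row[:] for row in L]       # P holds L^(2^i)
    for i in range(stages):
        # Both products read the old P, y and LI ("concurrent" in the text).
        Py = matvec(P, y)
        PLI = matmul(P, LI)
        y = [yr + s for yr, s in zip(y, Py)]                             # y <- (I + P) y
        LI = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(LI, PLI)]  # LI <- (I + P) LI
        if i < stages - 1:
            P = matmul(P, P)        # L^(2^(i+1)); not needed after the last stage
    return y, LI
```

Forward substitution on (3) gives the same $y$. Once `LI` is available, `matvec(LI, h2)` solves a second system with the same $L$ and a new right-hand side `h2`.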

It is easily shown that, with $O(N^3)$ processors, the calculation may be done in $m^2 + 3m + 1$ parallel steps of addition and multiplication. (We use the fact that matrix products may be computed in logarithmic time with sufficiently many processors.) The previous result required $m^2 + 4m + 2$ operation steps and a polynomial number of processors.
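The two operation counts can be compared numerically. This short sketch assumes $m = \lfloor \log_2 N \rfloor$, and the function name `step_counts` is hypothetical.

```python
def step_counts(N):
    """Return (new, previous) step counts m^2+3m+1 and m^2+4m+2.

    Assumes m = floor(log2 N), which for N >= 1 equals N.bit_length() - 1.
    """
    if N < 1:
        raise ValueError("N must be at least 1")
    m = N.bit_length() - 1
    return m * m + 3 * m + 1, m * m + 4 * m + 2
```

Under that assumption, `step_counts(1024)` returns `(131, 142)`. The saving is always $m + 1$ steps.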

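We now turn to the Kogge-Stone results. Rewrite (2) as

$$y_i = (a_i \otimes y_{i-1}) \oplus b_i, \quad i = 1, \ldots, N. \tag{2'}$$

Here $g$ is replaced by the binary operation $\otimes$, and $f$ by $\oplus$. Assume that $\oplus$ is associative, $\otimes$ distributes over $\oplus$, and there is an operation $\otimes'$ such that $a \otimes (b \otimes c) = (a \otimes' b) \otimes c$. Let $\alpha$ be a symbol distinct from all others, and define $\alpha \oplus x = x \oplus \alpha = x$ and $\alpha \otimes x = x \otimes \alpha = \alpha$ for all $x$. Define an operator $L$ on $(N+1)$-vectors by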


$$(Lz)_0 = \alpha, \qquad (Lz)_i = a_i \otimes z_{i-1}, \quad i = 1, \ldots, N.$$


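Then $y = Ly \oplus b$ by (2'), so that $y_0 = b_0$. It is observed that $L$ is an additive operator, $L(z \oplus w) = Lz \oplus Lw$, since $\otimes$ distributes over $\oplus$ and by the definition of $\alpha$. Moreover, $L^{N+1} = \alpha$, meaning that $L^{N+1}$ maps every vector to the vector all of whose components are $\alpha$. Indeed, by induction on $i$, for any $z$ and $i = 1, \ldots, N+1$ we have $(L^i z)_0 = \alpha$ and, for $1 \le j < i$,

$$(L^i z)_j = (L(L^{i-1} z))_j = a_j \otimes (L^{i-1} z)_{j-1} = a_j \otimes \alpha = \alpha.$$

Therefore,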

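$$\begin{aligned}
y &= Ly \oplus b = L(Ly \oplus b) \oplus b = L^2 y \oplus (L \oplus I)b \\
  &= \cdots = L^{N+1} y \oplus (L^N \oplus L^{N-1} \oplus \cdots \oplus L \oplus I)b \\
  &= (L^N \oplus \cdots \oplus L \oplus I)b \\
  &= (L^{2^m} \oplus I)(L^{2^{m-1}} \oplus I) \cdots (L \oplus I)b.
\end{aligned}$$

Here $(L^k \oplus I)z$ denotes $L^k z \oplus z$ and $m = \lfloor \log_2 N \rfloor$. Expanding the product by additivity gives $(L^{2^{m+1}-1} \oplus \cdots \oplus L \oplus I)b$. Since $2^{m+1} > N$, every extra power $L^k$ with $k > N$ contributes only $\alpha$, the identity for $\oplus$.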
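The operator $L$ and the symbol $\alpha$ can be made concrete with a small sketch, which is not from the report. It assumes the instance $\oplus = +$ and $\otimes = \times$, which satisfies the hypotheses with $\otimes' = \times$. It represents $\alpha$ by Python's `None`. The names `oplus`, `otimes`, `apply_L` and `apply_series` are hypothetical.

```python
# ALPHA models the symbol alpha: the identity for (+) and absorbing for (*).
ALPHA = None


def oplus(x, y):
    """x (+) y; instantiated here (an assumption) as ordinary addition."""
    if x is ALPHA:
        return y
    if y is ALPHA:
        return x
    return x + y


def otimes(x, y):
    """x (*) y; instantiated here (an assumption) as ordinary multiplication."""
    if x is ALPHA or y is ALPHA:
        return ALPHA
    return x * y


def apply_L(a, z):
    """(Lz)_0 = alpha, (Lz)_i = a_i (*) z_{i-1} for i = 1..N; a[0] is unused."""
    return [ALPHA] + [otimes(a[i], z[i - 1]) for i in range(1, len(z))]


def apply_series(a, b):
    """(L^N (+) ... (+) L (+) I) b, accumulated term by term (sequentially)."""
    total = list(b)                 # the I b term
    term = list(b)
    for _ in range(len(b) - 1):     # the N further terms L b, ..., L^N b
        term = apply_L(a, term)
        total = [oplus(t, s) for t, s in zip(term, total)]  # higher power on the left
    return total
```

For `a = [0, 2, 3, 4]` and `b = [1, 1, 1, 1]`, `apply_series(a, b)` returns `[1, 3, 10, 41]`. This vector satisfies $y = Ly \oplus b$. Four applications of `apply_L` to any 4-vector give the all-$\alpha$ vector, as $L^{N+1} = \alpha$ predicts.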

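Since $L^{k+1} = (L^k)L = L(L^k)$, $\otimes'$ behaves as an associative operation, and so

$$(L^{2^i} y)_j = \begin{cases} \alpha, & 0 \le j < 2^i, \\ (a_j \otimes' a_{j-1} \otimes' \cdots \otimes' a_{j-2^i+1}) \otimes y_{j-2^i}, & 2^i \le j \le N. \end{cases}$$

Similarly,

$$(L^{2^{i+1}} y)_j = \begin{cases} \alpha, & 0 \le j < 2^{i+1}, \\ \big[(a_j \otimes' \cdots \otimes' a_{j-2^i+1}) \otimes' (a_{j-2^i} \otimes' \cdots \otimes' a_{j-2^{i+1}+1})\big] \otimes y_{j-2^{i+1}}, & 2^{i+1} \le j \le N, \end{cases}$$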

and the "coefficients" of $L^{2^{i+1}}$ may be computed from the "coefficients" of $L^{2^i}$ in one $\otimes'$ operation step. Thus an algorithm similar to the previous one may be used to compute $y$. If the operator $(L^N \oplus \cdots \oplus L \oplus I)$ is not formed, the computation time is $O(\log_2 N)$ with $O(N)$ processors. In fact, if $y' = Ly' \oplus b'$, it is less efficient to apply $(L^N \oplus \cdots \oplus L \oplus I)$ directly to $b'$ than to use the above method.
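The following is a generic sketch of the resulting doubling algorithm for (2'). Each pass applies $(L^k \oplus I)$ with $k = 2^i$. It then forms the coefficients of $L^{2k}$ with one $\otimes'$ per row. This gives $\lceil \log_2(N+1) \rceil$ passes, each made of at most $N+1$ independent updates. The function name `doubling_solve` and its parameters are hypothetical, and the operations are passed in as Python callables. Rows where $L^k$ yields $\alpha$ are handled by leaving the entry unchanged instead of representing $\alpha$ explicitly.

```python
def doubling_solve(a, b, oplus, otimes, otimes_p):
    """Solve y_0 = b_0, y_i = (a_i (*) y_{i-1}) (+) b_i for i = 1..N.

    oplus, otimes, otimes_p play the roles of (+), (*) and (*)'.
    Returns (L^(2^m) (+) I) ... (L^2 (+) I)(L (+) I) b; a[0] is unused.
    """
    n = len(b)                 # n = N + 1
    z = list(b)                # z starts as I b
    c = list(a)                # c[j]: row-j "coefficient" of L^k, meaningful for j >= k
    k = 1                      # k = 2^i
    while k < n:
        # z <- (L^k (+) I) z.  Rows j < k of L^k z are alpha, the (+)-identity,
        # so those entries are unchanged.  Every row reads the old z.
        z = [z[j] if j < k else oplus(otimes(c[j], z[j - k]), z[j])
             for j in range(n)]
        # Coefficients of L^(2k): c_j (*)' c_{j-k}, one (*)' step per row.
        c = [c[j] if j < 2 * k else otimes_p(c[j], c[j - k])
             for j in range(n)]
        k *= 2
    return z
```

With ordinary `+` and `*`, this solves the linear recurrence $y_i = a_i y_{i-1} + b_i$. With `max` for $\oplus$ and `+` for both $\otimes$ and $\otimes'$ (addition distributes over max), it solves $y_i = \max(a_i + y_{i-1}, b_i)$.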
The general recurrence (1) may be written as $y = L_1 y$, where $L_1$ is a strictly lower triangular operator in the sense that, for any $z$, $(L_1 z)_i$ is independent of $z_i, z_{i+1}, \ldots, z_N$. By an induction argument, after $k$ applications of $L_1$ the components $0, \ldots, k-1$ no longer depend on $z$. Hence $L_1^{N+1}$ is a constant operator, and the solution may be found as $y = L_1^{N+1} z$ for any $z$. The special cases (2) and (3) allow the simple computation of the powers of $L_1$ when $L_1 z = Lz \oplus b$ and $L$ is linear.
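The fixed-point statement $y = L_1^{N+1} z$ can be checked directly. In this sketch, the recurrence (1) is given as a Python callable `F(i, prefix)` that returns $y_i$ from $y_0, \ldots, y_{i-1}$. The names `apply_L1` and `solve_by_powers` are hypothetical. The loop performs $N+1$ sequential applications of $L_1$, so it illustrates the identity only and is not a fast method.

```python
def apply_L1(F, y0, z):
    """(L_1 z)_0 = y0, (L_1 z)_i = F(i, [z_0, ..., z_{i-1}]); ignores z_i..z_N."""
    return [y0] + [F(i, z[:i]) for i in range(1, len(z))]


def solve_by_powers(F, y0, z):
    """Return L_1^(N+1) z, where N + 1 = len(z); the start vector z is arbitrary."""
    for _ in range(len(z)):
        z = apply_L1(F, y0, z)
    return z
```

Take the nonlinear `F = lambda i, prev: prev[-1] ** 2 - i * prev[0]` with $y_0 = 2$. Then `solve_by_powers(F, 2, [0] * 6)` returns `[2, 2, 0, -6, 28, 774]`, and any other starting vector of length 6 gives the same result.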

Kung [3] has shown that for non-linear recurrences, it is not possible, 
in general, to decrease the computation time by more than a constant 
factor by the use of parallelism. 